AITopics | neural text-to-speech




Large-Scale Automatic Audiobook Creation

Walsh, Brendan, Hamilton, Mark, Newby, Greg, Wang, Xi, Ruan, Serena, Zhao, Sheng, He, Lei, Zhang, Shaofei, Dettinger, Eric, Freeman, William T., Weimer, Markus

arXiv.org Artificial Intelligence, Sep-7-2023

An audiobook can dramatically improve a work of literature's accessibility and improve reader engagement. However, audiobooks can take hundreds of hours of human effort to create, edit, and publish. In this work, we present a system that can automatically generate high-quality audiobooks from online e-books. In particular, we leverage recent advances in neural text-to-speech to create and release thousands of human-quality, open-license audiobooks from the Project Gutenberg e-book collection. Our method can identify the proper subset of e-book content to read for a wide collection of diversely structured books and can operate on hundreds of books in parallel. Our system allows users to customize an audiobook's speaking speed, style, and emotional intonation, and can even match a desired voice using a small amount of sample audio. This work contributed over five thousand open-license audiobooks and an interactive demo that allows users to quickly create their own customized audiobooks. To listen to the audiobook collection, visit https://aka.ms/audiobook.

audiobook, high-quality audiobook, project gutenberg, (11 more...)

arXiv.org Artificial Intelligence

2309.03926

Genre: Research Report (0.40)

Industry: Media > Publishing (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.39)


GitHub - jaketae/storyteller: Multimodal AI Story Teller, built with Stable Diffusion, GPT, and neural text-to-speech

#artificialintelligence, Jan-2-2023, 22:00:42 GMT

A multimodal AI story teller, built with Stable Diffusion, GPT, and neural text-to-speech (TTS). Given a prompt as an opening line of a story, GPT writes the rest of the plot; Stable Diffusion draws an image for each sentence; a TTS model narrates each line, resulting in a fully animated video of a short story, replete with audio and visuals.
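The three-stage flow described above (a language model extends the prompt, an image model illustrates each sentence, a TTS model narrates each sentence) can be sketched as follows. This is a minimal illustration, not the storyteller repository's actual code: `tell_story`, `Scene`, `split_sentences`, and the three stub models are hypothetical names, and the stubs return placeholder strings instead of calling GPT, Stable Diffusion, or a TTS model.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    """One sentence of the story with its illustration and narration."""
    sentence: str
    image: str  # placeholder for image data produced by an image model
    audio: str  # placeholder for audio data produced by a TTS model


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tell_story(
    prompt: str,
    write: Callable[[str], str],
    draw: Callable[[str], str],
    narrate: Callable[[str], str],
) -> List[Scene]:
    # 1) The writer continues the story from the opening line.
    story = prompt + " " + write(prompt)
    # 2) and 3) Each sentence gets one image and one narration clip.
    return [Scene(s, draw(s), narrate(s)) for s in split_sentences(story)]


# Hypothetical stand-ins for the real models (assumptions for illustration).
def fake_writer(prompt: str) -> str:
    return "It was quiet. Then the rain came."


def fake_painter(sentence: str) -> str:
    return f"<image for: {sentence}>"


def fake_narrator(sentence: str) -> str:
    return f"<audio for: {sentence}>"


scenes = tell_story(
    "Once upon a time, a robot woke up.",
    fake_writer,
    fake_painter,
    fake_narrator,
)
for scene in scenes:
    print(scene.sentence, scene.image, scene.audio, sep=" | ")
```

Passing the three models in as callables keeps the pipeline independent of any particular model library; assembling the resulting scenes into a video is left out of the sketch.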

multimodal ai story teller, neural text-to-speech, stable diffusion, (3 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.75)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.75)
Information Technology > Artificial Intelligence > Assistive Technologies (0.75)


Enhancing audio quality for expressive Neural Text-to-Speech

Ezzerg, Abdelhamid, Gabrys, Adam, Putrycz, Bartosz, Korzekwa, Daniel, Saez-Trigueros, Daniel, McHardy, David, Pokora, Kamil, Lachowicz, Jakub, Lorenzo-Trueba, Jaime, Klimkov, Viacheslav

arXiv.org Artificial Intelligence, Aug-13-2021

Artificial speech synthesis has made a great leap in terms of naturalness as recent Text-to-Speech (TTS) systems are capable of producing speech with similar quality to human recordings. However, not all speaking styles are easy to model: highly expressive voices are still challenging even to recent TTS architectures since there seems to be a trade-off between expressiveness in a generated audio and its signal quality. In this paper, we present a set of techniques that can be leveraged to enhance the signal quality of a highly-expressive voice without the use of additional data. The proposed techniques include: tuning the autoregressive loop's granularity during training; using Generative Adversarial Networks in acoustic modeling; and the use of Variational Auto-Encoders in both the acoustic model and the neural vocoder. We show that, when combined, these techniques greatly closed the gap in perceived naturalness between the baseline system and recordings by 39% in terms of ...

Figure 1: Overview of model architecture. The system can be broken into two parts: an acoustic model and a neural vocoder that produces waveform. Orange blocks highlight the building neural network blocks for the acoustic model while the neural vocoder is represented by a blue box.

acoustic model, architecture, latent representation, (15 more...)

arXiv.org Artificial Intelligence

2108.0627

Genre: Research Report (0.43)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


AWS Polly gains neural voices in U.S. Spanish and Brazilian Portuguese

#artificialintelligence, Oct-23-2019, 19:42:21 GMT

Months after Amazon made Neural Text-To-Speech (NTTS) and the newscaster style generally available in Amazon Polly, a cloud service that converts text into speech, the Seattle company today debuted two new NTTS voices in U.S. Spanish and Brazilian Portuguese: "Lupe" and "Camila." Like the U.S. English NTTS voice before them, they mimic things like stress and intonation in speech by identifying tonal patterns. Neural versions of Camila and Lupe are available in Amazon Web Services' (AWS) U.S. East (N. [...] Standard variants are also available across 18 AWS regions, bringing Polly's total number of voices to 61 across 29 languages and the total number of voices available in both standard and neural versions to 13 across four languages. According to Amazon text-to-speech program manager Marta Smolarek, the new U.S. Spanish voice -- Lupe, which is the third U.S. text-to-speech voice in Polly -- not only speaks Spanish but also handles English and provides a fully bilingual Spanish-English experience.
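As a rough illustration of how the two voices named above might be requested, the sketch below builds the parameters for Polly's `synthesize_speech` operation as exposed by the boto3 SDK. The helper names `build_polly_request` and `synthesize`, the MP3 output format, and the `us-east-1` default region are assumptions made for this example and do not come from the article; actually running `synthesize` requires the boto3 package and valid AWS credentials.

```python
from typing import Dict

# Language codes for the two voices mentioned in the article.
VOICE_LANGUAGE = {
    "Lupe": "es-US",    # U.S. Spanish
    "Camila": "pt-BR",  # Brazilian Portuguese
}


def build_polly_request(text: str, voice_id: str, engine: str = "neural") -> Dict[str, str]:
    """Build keyword arguments for Polly's synthesize_speech call."""
    if voice_id not in VOICE_LANGUAGE:
        raise ValueError(f"unknown voice in this sketch: {voice_id}")
    return {
        "Text": text,
        "VoiceId": voice_id,
        "LanguageCode": VOICE_LANGUAGE[voice_id],
        "Engine": engine,       # "neural" or "standard"
        "OutputFormat": "mp3",  # assumption: MP3 chosen for illustration
    }


def synthesize(text: str, voice_id: str, region_name: str = "us-east-1") -> bytes:
    """Call Polly and return the audio bytes (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without it

    client = boto3.client("polly", region_name=region_name)
    response = client.synthesize_speech(**build_polly_request(text, voice_id))
    return response["AudioStream"].read()


request = build_polly_request("Hola, ¿cómo estás?", "Lupe")
print(request)
```

Switching `engine` to `"standard"` would request the non-neural variant of the same voice.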

aw polly gain neural voice, brazilian portuguese, sequence, (11 more...)

#artificialintelligence

Country:

North America > United States > Virginia (0.06)
North America > United States > Oregon (0.06)
Europe > Ireland (0.06)

Industry: Information Technology > Services (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.54)


Alexa is now programmed to sound like a real-life news anchor

Daily Mail - Science & tech, Jan-21-2019, 14:05:16 GMT

Amazon Alexa has been programmed to read the news headlines in the style of a newsreader. The popular voice assistant will now emphasise words, and mimic the intonation and pace of a TV anchor to present the news in a more natural way. Newsreader Alexa has been trained to read the daily bulletins when the user says 'Alexa, what's the latest?' The virtual assistant was already able to read out the headlines, but in the traditional robotic voice. Amazon conducted tests and found that people preferred hearing the news in this more realistic and listener-friendly manner, compared to the robotic tone.

alexa, artificial intelligence, natural language, (18 more...)

Daily Mail - Science & tech

Industry:

Information Technology > Services (0.52)
Media > News (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.73)
Information Technology > Communications > Social Media (0.70)
